ABSTRACT

We analyze the dependence of the membership probabilities obtained from kinematical variables on the radius of the field of view
around open clusters (the sampling radius, $R_s$). From simulated data, we show that the best discrimination between cluster members
and non-members is obtained when the sampling radius is very close to the cluster radius. At higher $R_s$ values more field stars tend
to be erroneously assigned as cluster members. From real data of two open clusters (NGC 2323 and NGC 2311) we find that the
number of identified cluster members always increases with increasing $R_s$. However, there is a threshold $R_s$ value above which the
identified cluster members are severely contaminated by field stars and the effectiveness of membership determination is relatively
small. This optimal sampling radius is $\simeq 14$ arcmin for NGC 2323 and $\simeq 13$ arcmin for NGC 2311. We discuss the reasons for
such behavior and the relationship between cluster radius and optimal sampling radius. We suggest that, independently of the method
used to estimate membership probabilities, several tests using different sampling radii should be performed in order to evaluate the
existence of possible biases.
Key words. open clusters and associations: general - open clusters and associations: individual: NGC 2311,

1. Introduction



The large astrometric catalogues derived from surveys covering
very wide areas of the sky are enabling the systematic search
for new star systems (see, for example, López-Corredoira et al.
1998; Hoogerwerf & Aguilar 1999; Kazakevich & Orlov 2002;
Myullyari et al. 2003; Caballero & Dinis 2008; Zhao et al.
2009, and references therein). The search process is based
on the detection of well-defined structures in some subsets of
the phase space. The presence of both spatial density peaks and
proper motion peaks indicates the existence of star clusters;
peaks visible only in the proper motion distributions suggest
the existence of moving groups; whereas more spread out and less
dense velocity-position correlated structures could be associated
with stellar streams. Once these structures have been detected,
the next step is to identify possible members of the
star system. For the particular case of open clusters, the most
often used procedure to select possible cluster members is the
algorithm designed by Sanders (1971). This algorithm is based
on an earlier model proposed by Vasilevskis et al. (1958) for
the proper motion distribution. The model assumes that cluster
members and field stars are distributed according to circular
and elliptical bivariate normal distributions, respectively.
Sanders' algorithm, or some variation or refinement of it, has
been and still is widely used to estimate cluster memberships,
either as the only method or as part of a more complete treatment
that includes, for example, spatial and/or photometric criteria.
Some recent representative references are Wu et al. (2002);
Jilinski et al. (2003); Balaguer-Núñez et al. (2004); Dias et al.
(2006); Kraus & Hillenbrand (2007); Wiramihardja et al.

With the advent of large catalogues and databases available
via the internet and future surveys such as the forthcoming Gaia
mission of ESA, interest in developing and applying fully
automated techniques is increasing among the astronomical
community. However, special care must be taken to avoid obtaining
biased results. In this work we show that the results yielded
by Sanders' algorithm depend significantly on the
choice of the size of the field of view surrounding the cluster.
So, once a possible open cluster has been detected, it is natural to ask
what area of the sky should be sampled in order to get the most
reliable membership determinations. It is equally important to
ask about the robustness of the methodology used, i.e. how does the
solution change when the sampled area is varied? Here we
explore these questions by using both simulated and real data. In
Section 2 we briefly present the method used to determine
memberships and describe the simulations that we performed to
analyze the expected behavior. The results of applying Sanders'
algorithm to the simulated data are discussed in Section 3. After
this, in Section 4 we use real astrometric data of two open
clusters (NGC 2323 and NGC 2311) to evaluate the performance of
the algorithm. We discuss strategies to estimate the optimal
sampling radius, i.e. the maximum radius beyond which the
identified cluster members are expected to be severely contaminated
by field stars. The main results of the present work are
summarized in the final section.



2. Description of the method 

2.1. Membership determination 



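The key point of the membership discrimination method is the
assumption that the distribution of observed proper motions
$(\mu_x, \mu_y)$ can be described by means of two bivariate normal
distributions, one circular for the cluster and one elliptical for the
field (Vasilevskis et al. 1958). Let $\Phi_c$ and $\Phi_f$ be the cluster and
field probability density functions, respectively. Then,

$$\Phi_c = \frac{1}{2\pi\sigma_c^2}\exp\left\{-\frac{1}{2}\left[\frac{(\mu_x-\mu_{x,c})^2}{\sigma_c^2}+\frac{(\mu_y-\mu_{y,c})^2}{\sigma_c^2}\right]\right\} \tag{1}$$

and

$$\Phi_f = \frac{1}{2\pi\sigma_{x,f}\sigma_{y,f}\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(\mu_x-\mu_{x,f})^2}{\sigma_{x,f}^2}-\frac{2\rho(\mu_x-\mu_{x,f})(\mu_y-\mu_{y,f})}{\sigma_{x,f}\,\sigma_{y,f}}+\frac{(\mu_y-\mu_{y,f})^2}{\sigma_{y,f}^2}\right]\right\} \tag{2}$$
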
where $(\mu_{x,c}, \mu_{y,c})$ is the cluster distribution centroid with standard
deviation $\sigma_c$, $(\mu_{x,f}, \mu_{y,f})$ is the field centroid with standard
deviations $\sigma_{x,f}$ and $\sigma_{y,f}$, and $\rho$ is the correlation coefficient of field
stars. The probability density function for the whole sample is
simply

$$\Phi = n_c\,\Phi_c + n_f\,\Phi_f \tag{3}$$

with $n_c$ and $n_f$ being the normalized numbers of cluster and field stars,
respectively. To obtain the unknown parameters (centroids,
standard deviations, numbers of members and non-members), an
iterative procedure is used by applying the maximum likelihood
principle (Sanders 1971). Here we use the algorithm proposed
by Cabrera-Caño & Alfaro (1985), which first detects and
removes outliers that can produce unrealistic solutions, and then
uses a more robust and efficient iterative procedure for the model
parameter estimation. Once these parameters are known, the
membership probability of the $i$-th star can be calculated
directly as

$$p(i) = \frac{n_c\,\Phi_c(i)}{\Phi(i)} \tag{4}$$
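To make equations (1)-(4) concrete, here is a minimal NumPy sketch that evaluates $\Phi_c$, $\Phi_f$ and $p(i)$. It assumes $n_c + n_f = 1$. The parameter estimation is done with a plain expectation-maximization (EM) loop. That loop is a stand-in we chose for illustration and is not the Cabrera-Caño & Alfaro (1985) procedure used in this work, which also rejects outliers. The function names, the parameter-dictionary keys and the starting values are all hypothetical. The synthetic sample loosely mimics the setup of Figure 1: 200 cluster stars at (1,0) with $\sigma_c = 1$ and 242 field stars at (0,0) with $\sigma_f = 5$.

```python
import numpy as np


def phi_cluster(mux, muy, par):
    """Circular bivariate normal PDF of the cluster, eq. (1)."""
    s2 = par["sc"] ** 2
    r2 = (mux - par["mxc"]) ** 2 + (muy - par["myc"]) ** 2
    return np.exp(-0.5 * r2 / s2) / (2.0 * np.pi * s2)


def phi_field(mux, muy, par):
    """Elliptical bivariate normal PDF of the field, eq. (2)."""
    dx = (mux - par["mxf"]) / par["sxf"]
    dy = (muy - par["myf"]) / par["syf"]
    one_m_r2 = 1.0 - par["rho"] ** 2
    q = (dx ** 2 - 2.0 * par["rho"] * dx * dy + dy ** 2) / one_m_r2
    norm = 2.0 * np.pi * par["sxf"] * par["syf"] * np.sqrt(one_m_r2)
    return np.exp(-0.5 * q) / norm


def membership(mux, muy, par):
    """Eqs. (3)-(4): p(i) = n_c Phi_c(i) / Phi(i), assuming n_f = 1 - n_c."""
    num = par["nc"] * phi_cluster(mux, muy, par)
    den = num + (1.0 - par["nc"]) * phi_field(mux, muy, par)
    return num / den


def fit_em(mux, muy, par, n_iter=300):
    """Plain EM maximum-likelihood fit (illustrative stand-in, no outlier rejection)."""
    par = dict(par)
    for _ in range(n_iter):
        w = membership(mux, muy, par)   # E-step: cluster responsibilities
        v = 1.0 - w                     # field responsibilities
        par["nc"] = w.mean()
        par["mxc"] = np.sum(w * mux) / w.sum()
        par["myc"] = np.sum(w * muy) / w.sum()
        # Circular cluster: a single sigma shared by both axes.
        d2 = (mux - par["mxc"]) ** 2 + (muy - par["myc"]) ** 2
        par["sc"] = np.sqrt(np.sum(w * d2) / (2.0 * w.sum()))
        # Elliptical field: weighted means, variances and correlation.
        par["mxf"] = np.sum(v * mux) / v.sum()
        par["myf"] = np.sum(v * muy) / v.sum()
        dx, dy = mux - par["mxf"], muy - par["myf"]
        par["sxf"] = np.sqrt(np.sum(v * dx * dx) / v.sum())
        par["syf"] = np.sqrt(np.sum(v * dy * dy) / v.sum())
        par["rho"] = np.sum(v * dx * dy) / v.sum() / (par["sxf"] * par["syf"])
    return par


# Usage on a synthetic sample (toy values, not real data).
rng = np.random.default_rng(1)
mux = np.r_[rng.normal(1.0, 1.0, 200), rng.normal(0.0, 5.0, 242)]
muy = np.r_[rng.normal(0.0, 1.0, 200), rng.normal(0.0, 5.0, 242)]
start = dict(nc=0.3, mxc=0.5, myc=0.0, sc=2.0,
             mxf=0.0, myf=0.0, sxf=4.0, syf=4.0, rho=0.0)
fit = fit_em(mux, muy, start)
p = membership(mux, muy, fit)
members = p > 0.5   # the p(i) > 0.5 criterion used in Section 3
```

The EM loop converges to a local maximum of the likelihood, so sensible starting values matter. A real analysis would also need the outlier handling described in the text.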



2.2. Simulations 



Let us consider a cluster with a given radius $R_c$. We define the
"cluster radius" as the radius of the smallest circle that
completely encloses its stars. In real situations $R_c$ is essentially an
unknown quantity that has to be estimated a posteriori, but here
its value is known and kept constant within each simulation. The
total number of stars belonging to the cluster is denoted by $N_{c,\max}$
and the number of field stars lying within the same sky
area as the cluster is $N_{f,\mathrm{cr}}$. The independent variable is the radius
of the field encircling the cluster. This radius might represent the
radius of the field in which the observations are made or of the field
around the cluster extracted from an astrometric catalogue. We
call this variable the sampling radius $R_s$, which can be larger or
smaller than the cluster radius $R_c$.

The numbers of cluster stars and field stars to be simulated
are denoted by $N_{c,\mathrm{sim}}$ and $N_{f,\mathrm{sim}}$, respectively. Obviously, the
numbers of cluster stars and field stars within the
field of view depend on the size of this field, that is, both $N_{c,\mathrm{sim}}$
and $N_{f,\mathrm{sim}}$ are functions of $R_s$. If the field stars are distributed nearly
uniformly in space, then $N_{f,\mathrm{sim}}$ should increase with the sampling
radius as

$$N_{f,\mathrm{sim}}(R_s) = N_{f,\mathrm{cr}}\left(\frac{R_s}{R_c}\right)^2 \tag{5}$$



The rate at which $N_{c,\mathrm{sim}}$ increases with $R_s$ depends instead on the
radial profile of the surface density of cluster stars ($\Sigma_{c,\mathrm{sim}}$). For
simplicity, let us assume that the surface density at $r$ is given by
(Caballero 2008)

$$\Sigma_{c,\mathrm{sim}}(r) = \frac{\delta\,N_{c,\max}}{2\pi R_c^{\delta}}\,r^{\delta-2} \tag{6}$$

with the index $\delta \le 2$. For the extreme case $\delta = 2$, we have
$\Sigma_{c,\mathrm{sim}} = N_{c,\max}/(\pi R_c^2)$ = constant. Integrating equation (6) we
obtain the number of cluster stars within a given sampling radius
(for $R_s \le R_c$),

$$N_{c,\mathrm{sim}}(R_s) = \int_0^{R_s} 2\pi r\,\Sigma_{c,\mathrm{sim}}(r)\,dr = N_{c,\max}\left(\frac{R_s}{R_c}\right)^{\delta} \tag{7}$$

Negative $\delta$ values make no sense, so this approach is limited
to the range $0 < \delta \le 2$. The role of the parameter $\delta$ is just to
control how fast $N_{c,\mathrm{sim}}$ increases as $R_s$ increases. Thus, the exact
functional form need not be known, as long as we are able
to simulate either completely flat ($\delta = 2$) or extremely peaked
($\delta \to 0$) density profiles.
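Equations (5)-(7) can be checked numerically with a short sketch. It computes the expected cluster and field counts inside a sampling radius and verifies equation (7) by integrating the surface density of equation (6) with a midpoint rule. The function names are hypothetical. The cap $N_{c,\mathrm{sim}} = N_{c,\max}$ for $R_s > R_c$ follows the simulation recipe described below in this section.

```python
import numpy as np


def sigma_cluster(r, rc, nc_max, delta):
    """Surface density of cluster stars at radius r <= rc, eq. (6)."""
    return delta * nc_max / (2.0 * np.pi * rc ** delta) * r ** (delta - 2.0)


def n_cluster_sim(rs, rc, nc_max, delta):
    """Cluster stars inside the sampling radius, eq. (7); constant beyond rc."""
    return nc_max * min(rs / rc, 1.0) ** delta


def n_field_sim(rs, rc, nf_cr):
    """Field stars inside the sampling radius for a uniform field, eq. (5)."""
    return nf_cr * (rs / rc) ** 2


def n_cluster_numeric(rs, rc, nc_max, delta, n=200_000):
    """Midpoint-rule integral of 2*pi*r*Sigma(r) from 0 to rs (valid for rs <= rc)."""
    dr = rs / n
    r = (np.arange(n) + 0.5) * dr
    return np.sum(2.0 * np.pi * r * sigma_cluster(r, rc, nc_max, delta)) * dr


# Expected fraction of cluster stars for a peaked profile (delta = 0.5).
for x in (0.25, 0.5, 1.0, 1.5):
    nc = n_cluster_sim(x, 1.0, 200, 0.5)
    nf = n_field_sim(x, 1.0, 200)
    print(f"R_s/R_c={x:4.2f}  N_c={nc:6.1f}  N_f={nf:6.1f}  fraction={nc / (nc + nf):.3f}")
```

Because $N_{c,\mathrm{sim}} \propto R_s^{\delta}$ while $N_{f,\mathrm{sim}} \propto R_s^2$, the printed fraction falls with $R_s$ whenever $\delta < 2$. This is the trend discussed in Section 3.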

To perform the simulations we distribute $N_{f,\mathrm{sim}}$ field stars and
$N_{c,\mathrm{sim}}$ cluster stars according to bivariate Gaussian distributions
in proper motion space $(\mu_x, \mu_y)$. The routine "gasdev" from
the Numerical Recipes package (Press et al. 1992) is used for
generating normally distributed random numbers. The fields are
centered at (0,0) with standard deviations $\sigma_{x,f} = \sigma_{y,f} = \sigma_f$.
Tests performed using elliptical (rather than circular)
distributions for the field stars yielded essentially the same results
and trends. The clusters are centered at $(\mu_{x,c}, \mu_{y,c})$ and have
standard deviations $\sigma_{x,c} = \sigma_{y,c} = \sigma_c$. Thus, for a given sampling
radius $R_s$ and according to equation (5), we randomly
generate $N_{f,\mathrm{sim}}$ field stars that follow a bivariate normal distribution
in proper motion space. For the cluster, we generate $N_{c,\mathrm{sim}}$
stars according to equation (7) when $R_s < R_c$, and we generate
$N_{c,\mathrm{sim}} = N_{c,\max}$ = constant stars when $R_s > R_c$. The three free
parameters, excluding those describing the Gaussians, are the
total number of stars in the cluster ($N_{c,\max}$), the number of field
stars falling within the cluster area ($N_{f,\mathrm{cr}}$), and the cluster star
density profile ($\delta$). For each set of parameters we performed
100 simulations and calculated the average values of the
studied quantities with their corresponding standard deviations.
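The simulation recipe above can be sketched in a few lines. One realisation of the proper-motion sample is generated for a given $R_s/R_c$, and a statistic is then averaged over repeated realisations. This sketch substitutes NumPy's `default_rng` normal generator for the Numerical Recipes `gasdev` routine. The default parameter values are taken from the Figure 1 caption. The names `simulate_sample`, `run_many` and `statistic` are hypothetical.

```python
import numpy as np


def simulate_sample(rs_over_rc, nc_max=200, nf_cr=200, delta=2.0,
                    mu_c=(1.0, 0.0), sigma_c=1.0, sigma_f=5.0, rng=None):
    """One realisation of the simulated proper-motion sample at R_s / R_c."""
    rng = np.random.default_rng() if rng is None else rng
    x = rs_over_rc
    n_f = int(round(nf_cr * x ** 2))                    # eq. (5)
    n_c = int(round(nc_max * min(x, 1.0) ** delta))     # eq. (7), capped at N_c,max
    cluster = rng.normal(loc=mu_c, scale=sigma_c, size=(n_c, 2))
    field = rng.normal(loc=0.0, scale=sigma_f, size=(n_f, 2))   # centred at (0, 0)
    mu = np.vstack([cluster, field])                    # columns: mu_x, mu_y
    is_cluster = np.r_[np.ones(n_c, dtype=bool), np.zeros(n_f, dtype=bool)]
    return mu, is_cluster


def run_many(rs_over_rc, statistic, n_runs=100, seed=0, **kw):
    """Mean and standard deviation of statistic(mu, is_cluster) over n_runs realisations."""
    rng = np.random.default_rng(seed)
    values = [statistic(*simulate_sample(rs_over_rc, rng=rng, **kw))
              for _ in range(n_runs)]
    return np.mean(values), np.std(values)


# The Figure 1 configuration: N_c,max = N_f,cr = 200, delta = 2, R_s / R_c = 1.1.
mu, is_cluster = simulate_sample(1.1, rng=np.random.default_rng(0))
```

In a full experiment, `statistic` would run the membership algorithm on `mu` and return, for example, the number of assigned members.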



3. Results from simulations 

For each simulation, we have calculated cluster membership
probabilities using the method described in Section 2. We have
performed several simulations varying the input parameters (the
number of stars in the cluster and in the field, centroid distance
in proper motion space, and standard deviations) within
reasonable ranges. Except for minor differences, such as larger
error bars when cluster and field distributions are more
similar, all the results and trends remained essentially identical
to those described in this section. Let us start by showing how the
algorithm works. Figure 1 shows an example of a simulated
cluster of 200 stars that has been adequately sampled
with $R_s = 1.1R_c$. The right panel clearly shows the occasional
but inevitable "failures" of the method. First, cluster stars falling
in the tails of their own distribution may not be recognized as
members. Second, field stars falling by chance under the cluster
distribution may be selected as probable members.

What would happen if we selected a larger field? In order to
address this point we have calculated membership probabilities
as a function of the sampling radius. Here we consider
as cluster members those stars having membership probabilities
$p(i) > 0.5$ in a Bayesian sense. We have done several tests using
different selection criteria and, as expected, the number of
assigned members depends on the criterion, but the main results and trends
presented here remained unchanged. Figure 2 shows the number
of stars classified as members (denoted by $N_c$) or
non-members ($N_f$) by the algorithm as a function of the sampling
radius. For these particular simulations the number of assigned
members $N_c$ is always higher than the real number of cluster
stars. Most of the cluster stars are well identified but, as
mentioned before, field stars falling under the cluster distribution
are also considered members. For the same reason the
number of field stars is always smaller than its expected value. For
$R_s < R_c$ (subsampled cluster), $N_c$ increases with $R_s$ because
the number of cluster stars in the sample increases as $R_s$
increases. The rate at which this occurs depends on the cluster
density profile, which for the simulations with $\delta = 2$ in Figure 2 is
exactly the same as for the field (homogeneous distribution). For
$R_s > R_c$ we observe a change in the behavior of $N_c$. In this case
no new cluster stars enter the sample as $R_s$ increases,
and $N_c$ increases only slightly because of the new field stars that are
erroneously classified as possible members. On the other hand, the
number of field stars always increases at a rate roughly proportional to $R_s^2$.
It is easy to see that, in general, the fraction of cluster stars (shown
in Figure 3) should be a decreasing function of $R_s$ for any cluster
with $\delta < 2$, since for $R_s < R_c$ the number of cluster stars grows only as
$R_s^{\delta}$, and beyond $R_c$ it stays nearly constant. Only for the extreme case of
homogeneous clusters ($\delta = 2$) does the fraction of cluster stars remain
constant with $R_s$ for $R_s < R_c$.

Fig. 1. Proper motions of the stars of a random simulation with $N_{c,\max} = N_{f,\mathrm{cr}} = 200$, $\delta = 2$, and $R_s/R_c = 1.1$ (see text for details
of the meaning of each of these quantities). The left panel shows the distribution of all 442 simulated stars. Red circles are the
field stars centered at (0,0) with $\sigma_f = 5$ and blue circles are the 200 cluster stars centered at (1,0) with $\sigma_c = 1$. The right panel is a
magnification of the central region in which we have marked with circles the stars whose resulting cluster membership probabilities
are higher than 0.5 according to the algorithm used.

Fig. 2. Calculated number of field and cluster stars as a
function of the sampling radius in units of the cluster radius, $R_s/R_c$,
for simulations with the same set of parameters as Figure 1. (a)
Simulation with a peaked density profile ($\delta = 0.5$); assigned
members are indicated by squares connected by lines. (b) Simulation
with a flat density profile ($\delta = 2$); members are indicated by
circles connected by lines. Assigned field stars are indicated by
vertical bars connected by lines, the length of the bars indicating
one standard deviation. The real numbers of simulated stars are
shown by dashed lines (cluster) and dotted lines (field).

Fig. 3. Calculated fraction of cluster stars as a function of the
sampling radius for the same simulations as in Figure 2. The
real (simulated) values are shown by dashed lines.

Fig. 4. Matching fraction of the cluster (see text) as a function of
the sampling radius for the same simulations as in Figure 2. The
error bars are of the order of the symbol sizes but are not
shown for clarity.

Figures 2 and 3 show the number of stars classified as
members, but they do not tell us whether this classification is actually
correct. In order to quantify the correctness of the result we
define the matching fraction of the cluster, $M_c$, as the net
proportion of cluster stars that are well classified. If $N_k$ is the total
number of cluster stars correctly classified as members minus the
number of cluster stars incorrectly classified as non-members,
then $M_c = N_k/N_{c,\max}$. $M_c$ can be negative if the
number of misclassifications is higher than the number of correct
classifications, and $M_c$ is exactly 1 only when the algorithm
correctly classifies all the stars of the cluster. In Figure 4 we see
that the highest $M_c$ value occurs precisely when the sampling
radius equals the cluster radius. At smaller sampling radii the
matching fraction of the cluster obviously decreases because the
cluster is being subsampled. Interestingly, the matching fraction
is also smaller at $R_s > R_c$, but the reason in this case is that
more field stars are being erroneously assigned to the cluster as
$R_s$ increases. The best classification is obtained when the
sampling radius is very close to the cluster radius although, as
expected, even in this case the matching fraction does not reach
its maximum value $M_c = 1$. Nevertheless, the matching fraction is
relatively high ($M_c = 0.83$) at $R_s = R_c$ and decreases slowly to
0.71 at $R_s = 1.5R_c$. Moreover, the behaviors of $N_c$ and $N_f$ with
$R_s$ are very similar to the expected ones (Figures 2 and 3). This
is because both cluster and field stars were simulated following
perfect normal distributions and, therefore, both populations can
be well detected by the algorithm, since it assumes the same kind
of underlying distribution. When using real data the situation
becomes more complex, as discussed in the next section.
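For simulated data the true labels are known, so $M_c$ can be computed directly. The sketch below follows the definition of $N_k$ exactly as worded above, using the $p(i) > 0.5$ threshold. Because the discussion attributes the drop of $M_c$ beyond $R_c$ to field stars assigned to the cluster, it also counts those contaminants separately. The function names and the toy arrays are hypothetical.

```python
import numpy as np


def matching_fraction(prob, is_cluster, nc_max, threshold=0.5):
    """M_c = N_k / N_c,max, with N_k defined as in the text."""
    prob = np.asarray(prob, dtype=float)
    is_cluster = np.asarray(is_cluster, dtype=bool)
    member = prob > threshold
    correct = np.count_nonzero(member & is_cluster)    # cluster stars kept as members
    missed = np.count_nonzero(~member & is_cluster)    # cluster stars labelled non-members
    return (correct - missed) / nc_max


def n_contaminants(prob, is_cluster, threshold=0.5):
    """Field stars wrongly assigned to the cluster."""
    member = np.asarray(prob, dtype=float) > threshold
    return np.count_nonzero(member & ~np.asarray(is_cluster, dtype=bool))


# Toy example: three cluster stars (one missed) and one field star taken as a member.
prob = [0.9, 0.8, 0.2, 0.7]
truth = [True, True, True, False]
mc = matching_fraction(prob, truth, nc_max=3)
fp = n_contaminants(prob, truth)
```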

4. Results using real data 

We use the CdC-SF Catalogue (Vicente et al. 2009), an
astrometric catalogue with a mean precision in the proper motions
of 2.0 mas/yr (1.2 mas/yr for well-measured stars, typically
$V < 14$). Given the position of a known open cluster, we
extract circular fields of varying radius centred on it and then
calculate membership probabilities using the same algorithm
as in Section 3. Here we analyze two open clusters that are
included in the area covered by this catalogue: NGC 2323 (M 50)
and NGC 2311. To further minimize the influence
of possible outliers on our results, we restrict the sample
to $|\mu| < 20$ mas/yr. The number of probable members $N_c$, i.e.
stars having membership probabilities higher than 0.5, is shown
in Figure 5 as a function of the sampling radius. $N_c$
always increases with increasing $R_s$, and there are no relatively
flat regions analogous to those observed in Figure 2 for $R_s > R_c$.
Without prior knowledge of the approximate value of the
cluster radius, how can we determine which is the most reliable
result? This is not a trivial question given the large
uncertainties involved in the estimation or definition of the cluster radius
(see discussion in Section 4.2). For example, the radius of the
total extent of NGC 2323 estimated by different authors has
varied over the last years: 10 arcmin (Clariá et al. 1998), 16.7
arcmin (Nilakshi et al. 2002), 15 arcmin (Kalirai et al. 2003),
22.2 arcmin (Kharchenko et al. 2005), 17 arcmin (Sharma et al.
2006, using their own optical data) or 22 arcmin (Sharma et al.
2006, using 2MASS data). Our calculations yield $N_c = 198$
probable members in a field of radius $R_s = 17$ arcmin, but this
number increases to $N_c = 336$ for $R_s = 22$ arcmin. This means
that there could be more than 100 undetected members if we use
$R_s = 17$ arcmin and the cluster radius is actually $R_c = 22$
arcmin or, conversely, more than 100 spurious members if we
use $R_s = 22$ arcmin and $R_c = 17$ arcmin.

Fig. 5. Number of cluster stars $N_c$ as a function of the sampling
radius $R_s$ in arcmin for the open clusters NGC 2323 (squares
connected by lines) and NGC 2311 (circles connected by lines).

Fig. 6. Fraction of cluster stars as a function of the sampling
radius for NGC 2323 (squares connected by lines) and NGC 2311
(circles connected by lines). Vertical arrows indicate the optimal
sampling radii (see text).

Fig. 7. Estimated standard deviations as a function of the
sampling radius for the clusters NGC 2323 (squares connected by
lines) and NGC 2311 (circles connected by lines). The bars
indicate the uncertainties obtained from bootstrapping.

Fig. 8. Effectiveness of membership determination (see
Equation (8)) as a function of the sampling radius for the
open cluster NGC 2323 (open squares connected by solid lines)
and for simulations using parameter values corresponding to
those obtained for NGC 2323 (dashed lines).

The fraction of cluster members is shown in Figure 6. The trend in which $N_c/(N_c + N_f)$
decreases with $R_s$ is qualitatively consistent with the expected
behavior (Figure 3). However, there is an $R_s$ value beyond which
the fraction of members increases as $R_s$ increases and, as
mentioned in the previous section, this behavior is possible only if
$N_c$ increases faster than $N_f$ does (i.e. at a rate higher than $\sim R_s^2$).
The only way this could happen is if the algorithm is introducing
many spurious members as $R_s$ increases. In other words, there is
a critical $R_s$ value above which a significant number of spurious
members are erroneously included as part of the cluster (see also
Piatti et al. 2009). Here we call this critical value the optimal
sampling radius, $R_{s,\mathrm{opt}}$, and obviously it is not recommended to
use a sampling radius larger than this value. From Figure 6 we
get $R_{s,\mathrm{opt}} \simeq 14$ arcmin for NGC 2323 and $R_{s,\mathrm{opt}} \simeq 13$ arcmin
for NGC 2311, but we have to point out that these values are
valid for the data we are using and, in principle, they cannot be
extrapolated to other data sets.
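A possible way to automate this scan is sketched below. The sketch selects the stars inside a circular field of radius $R_s$ around the cluster centre using the haversine great-circle separation. It interprets $|\mu| < 20$ mas/yr as a cut on the total proper motion, which is our assumption. It then takes $R_{s,\mathrm{opt}}$ as the last radius before the member fraction $N_c/(N_c+N_f)$ starts to rise. All function names are hypothetical, and the counts in the usage lines are invented toy numbers, not the measurements of Figures 5 and 6. On real, noisy curves a single upturn can be noise, so some smoothing or inspection of the curve is advisable.

```python
import numpy as np


def angular_sep_arcmin(ra, dec, ra0, dec0):
    """Great-circle separation (haversine); inputs in degrees, output in arcmin."""
    ra, dec, ra0, dec0 = (np.radians(v) for v in (ra, dec, ra0, dec0))
    h = (np.sin((dec - dec0) / 2.0) ** 2
         + np.cos(dec) * np.cos(dec0) * np.sin((ra - ra0) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(h))) * 60.0


def select_field(ra, dec, mux, muy, ra0, dec0, rs_arcmin, mu_max=20.0):
    """Boolean mask: within R_s of the centre and total proper motion below mu_max."""
    sep = angular_sep_arcmin(ra, dec, ra0, dec0)
    return (sep <= rs_arcmin) & (np.hypot(mux, muy) < mu_max)


def optimal_sampling_radius(rs, n_c, n_f):
    """Last R_s before the fraction N_c / (N_c + N_f) starts to increase."""
    rs = np.asarray(rs, dtype=float)
    n_c = np.asarray(n_c, dtype=float)
    frac = n_c / (n_c + np.asarray(n_f, dtype=float))
    for k in range(1, len(frac)):
        if frac[k] > frac[k - 1]:
            return rs[k - 1]
    return None   # no upturn within the scanned range


# Toy scan (invented counts): the fraction falls until 14 arcmin, then rises.
rs_opt = optimal_sampling_radius([8, 10, 12, 14, 16, 18],
                                 [60, 90, 120, 147, 200, 260],
                                 [60, 130, 230, 300, 330, 340])
```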

The main reason behind the behavior observed in Figure 6
lies in the disagreement between the assumed and the "true"
underlying distributions of proper motions of field stars. A circular
bivariate normal function is a good representation of the cluster
probability density function (PDF), the standard deviation being
the result of observational errors that prevent the intrinsic
velocity dispersion of the cluster from being completely resolved.
However, it is known that an elliptical bivariate normal function
is not always the best model for the field PDF (see discussions on
this subject in Cabrera-Caño & Alfaro 1990; Uribe & Brieva
1994; Balaguer-Núñez et al. 2004; Sánchez & Alfaro 2009;
Griv et al. 2009). The combination of several factors, such as
galactic differential rotation or peculiar motions, may affect
the field star distribution, which usually tends to exhibit
non-Gaussian tails. Non-parametric models, which make no a
priori assumptions about the cluster or field star distributions,
have been introduced and used to overcome this problem (cf.
Cabrera-Caño & Alfaro 1990; Chen et al. 1997). It is
interesting to note that the classical parametric and non-parametric
methods agree reasonably well with each other only for
nearly Gaussian field distributions (see Figure 5 in
Sánchez & Alfaro 2009). When the number of field stars
increases and the algorithm tries to fit a Gaussian function to the
field PDF, the fit tends to produce a wider and flatter function. As a
consequence, the membership probabilities (defined as the ratio
of the cluster to the total proper motion distribution function)
increase, and therefore the number of assigned members also
increases. This effect is magnified when the cluster distribution
becomes "contaminated" by many field stars, because then the
standard deviation of the cluster tends to increase, with a
consequent increase in the number of spurious members. The
standard deviations estimated for the two clusters under
consideration are shown in Figure 7. The error bars were estimated
using bootstrap techniques: the calculation is repeated on a
series of 100 random resamplings of the data, and the standard
deviation of the obtained set of values is taken as the
associated uncertainty. The standard deviations remain nearly constant
($\sigma_c \simeq$ 1.4-1.6 mas/yr for NGC 2311 and $\sigma_c \simeq$ 0.9-1.0 mas/yr for NGC 2323)
in the region in which $R_s < R_{s,\mathrm{opt}}$ (see also Figure 6). This is the
expected behavior because, in principle, $\sigma_c$ should not depend
on the sample size. However, above the optimal sampling radius
we can see a gradual increase in $\sigma_c$ due to the effect mentioned
previously.
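The bootstrap error estimate described above can be sketched generically. The data set is resampled with replacement 100 times, the estimator is recomputed on each resample, and the spread of the results is taken as the uncertainty. The `estimator` argument is a hypothetical placeholder. In practice it would rerun the membership fit and return, for example, $\sigma_c$. The usage line substitutes a simple mean so that the result is easy to check.

```python
import numpy as np


def bootstrap_std(data, estimator, n_boot=100, rng=None):
    """Standard deviation of `estimator` over n_boot resamplings with replacement.

    `data` is indexed along its first axis (one row per star), so an
    (N, 2) array of (mu_x, mu_y) values works as well as a 1-D array.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = len(data)
    values = [estimator(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(values, ddof=1)


# Check on a toy sample: the bootstrap error of a mean should be close to 1/sqrt(N).
rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=1000)
err = bootstrap_std(sample, np.mean, n_boot=100, rng=rng)
```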

4.1. Effectiveness of membership determination 

It is not possible in practice to quantify the degree of
correlation between identified and true cluster members, as was done
with the matching fraction in Figure 4. Instead, we can use the concept
of effectiveness of membership determination, which is defined as
(Tian et al. 1998; Wu et al. 2002)

$$E = 1 - \frac{N\sum_{i=1}^{N} p(i)\,[1-p(i)]}{\sum_{i=1}^{N} p(i)\,\sum_{i=1}^{N}[1-p(i)]} \tag{8}$$

where $p(i)$ is the membership probability of the $i$-th star and $N$
is the sample size. This index measures how effective the
membership determination is, in the sense of measuring the
separation between field and cluster populations in the probability
histogram. The higher the index $E$, the more effective the
membership determination. The maximum value, $E = 1$, is obtained when
there are two perfectly separated populations of $N_c$ stars with
membership probabilities $p(i) = 1$ and $N_f$ stars with $p(j) = 0$.
Figure 8 shows $E$ for the open cluster NGC 2323 as a
function of the sampling radius. For the sake of comparison we
also show the result for simulations using the same parameters
as those for NGC 2323. Our most reliable estimation for this
cluster ($R_s = R_{s,\mathrm{opt}} \simeq 14$ arcmin) yielded the following values
for the proper motions (in mas/yr): $\mu_{x,c} = 1.09$, $\mu_{y,c} = 1.13$,
$\sigma_{x,c} = \sigma_{y,c} = 1.01$, $\mu_{x,f} = +0.77$, $\mu_{y,f} = -2.54$, $\sigma_{x,f} = 6.41$,
and $\sigma_{y,f} = 5.84$. According to the result shown in Figure 9 (next
section), we assume $R_c = 20$ arcmin and $\delta = 1.1$ for the
cluster. Additionally, we choose $N_{c,\max} = 250$ and $N_{f,\mathrm{cr}} = 500$ in
order to reproduce the measured values $N_c = 147$ and $N_f = 231$ at
$R_s = 14$ arcmin. The superimposed dashed lines in Figure 8 are
the average values (and their standard deviations) for these
simulations. The simulated $E$ value remains fairly constant (within
the uncertainties) as $R_s$ increases until $R_s \simeq R_c = 20$
arcmin, beyond which it decreases at a relatively high rate. For
NGC 2323 we see that $E$ begins to decrease more rapidly as $R_s$
increases just beyond $R_s \simeq R_{s,\mathrm{opt}} = 14$ arcmin. The best
separation between cluster and field stars, and the best agreement with the
simulations, are achieved in the range $10 \lesssim R_s \lesssim 14$ arcmin.
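Equation (8) translates directly into code. The sketch below computes $E$ from an array of membership probabilities, for instance the output of any fit at a given $R_s$, so that it can be tracked along a scan of sampling radii. The function name is hypothetical. Note that $E$ is undefined when every $p(i)$ is 0 or every $p(i)$ is 1, because one of the sums in the denominator vanishes.

```python
import numpy as np


def effectiveness(p):
    """Effectiveness of membership determination, eq. (8)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    return 1.0 - n * np.sum(p * (1.0 - p)) / (np.sum(p) * np.sum(1.0 - p))


# Well separated toy probabilities give E close to 1; ambiguous ones give E near 0.
e_good = effectiveness([0.98, 0.95, 0.90, 0.05, 0.02, 0.01])
e_flat = effectiveness([0.5, 0.5, 0.5, 0.5])
```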

4.2. Cluster radius and optimal sampling radius 

Basically, what we are saying is that, at least when using only
kinematical criteria, the sample size can substantially alter the
results obtained (the memberships and the rest of the properties
derived from them). Thus, the strategy of choosing a field large
enough to be sure of covering more than the whole cluster has
to be adopted with extreme caution, especially in dense star fields.
According to our simulations (Section 3), the best membership
estimation is achieved when $R_s \simeq R_c$. This would seem an
obvious result, given that for $R_s < R_c$ the cluster is subsampled
whereas for $R_s > R_c$ the probability of contamination by field
stars increases. The important point here is: how well can
we know the cluster radius before estimating memberships? It
is difficult to determine the radius of a cluster precisely because
the definition of radius is itself ambiguous, since star clusters
have no well-defined natural boundaries. In this work we have
used the usual definition of $R_c$ as the radius of the circle
containing all the cluster members. Most of the "geometric"
definitions tend to overestimate the actual size, especially for
irregularly shaped clusters (Schmeja & Klessen 2006). But this is not
the main problem. The problem is that the independent
estimates of cluster radii available in the literature usually exhibit
significant differences and uncertainties. Angular sizes listed in
catalogues such as Webda¹ were compiled from older references (e.g.
Lyngå 1987) in which most of the apparent diameters were
estimated from visual inspection. According to Webda, $R_c \simeq 7$
arcmin for NGC 2323, but in recent years this value has been
tripled (Sharma et al. 2006). As mentioned above, it is
common practice to choose a field larger than the apparent area
covered by the cluster (taken from the literature) for
estimating membership probabilities. But, at least when applying
Sanders' method, assigned members will spread throughout the
whole selected area because of the contamination by field stars.
It is probably not coincidental that this is the case, for
example, for the probable members in the Dias catalogue (Dias et al.
2002). How reliable are all the memberships that have been
derived from proper motions? It depends on the "real" $R_c$ values.
Thus, again, the ideal situation would be some kind of robust
estimation of the radius.

A commonly used procedure to determine (or define) the 
cluster radius is based on the analysis of the projected radial 
density profile. Usually, some particular analytical function (for 
example, a King-like model) is fitted to the density profile and 
the cluster radius is extracted from this fit. The last study dealing 
with a systematic determination of cluster sizes based on objec- 
tiv e and uniform estimations of radial density profiles was done 
by iKharchenko et alj d2004l) . One limitation of this method is 
the sensitivity of the fit to small variations in the distribution of 
stars, especially for poorly populated open cluster. The most re- 
liable fits are obtained using only the cluster members, but then 
we are confronting again the problem of membership determi- 
nation. As an example, let us consider Figure [9] which compares 

1 http://www.univie.ac.at/webda 
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Fig. 9. Radial density profiles for the cluster NGC 2323 calcu- 
lated for the cases R s = 14 arcmin (solid circles) and R s = 25 
arcmin (open circles). Lines show the best fits for functions of 
the form ~ r 15 2 (see Equation The solid line is for the case 
R s = 14 arcmin for which 6 - 1.7, and the dashed line corre- 
sponds to R s = 25 arcmin for which 5 - 1.2. 



the density profiles obtained for the open cluster NGC 2323 
for two different sampling radii: R s = R Sy0pt = 14 arcmin and 
R s = 25 arcmin. According to our results (section |4]i our most 
reliable estimation is achieved when R s = R s , t ,pt- For this case, 
the best least squares fit to a power law function suggests a 
cluster radius in the range ~ 20 - 25 arcmin. However, if we 
take a sample of size R s = 25 arcmin the contamination by 
field stars tends to produce an overestimation of the star density 
and both the index of the power law and the estimated cluster 
radius change notoriously (see Figure [9]). But the main draw- 
back of this method is that simple analytical fits are not always 
a good representation for the stars distribution in open clusters 
(San chez & Alfaroll2009l) . The radius defined through a fit to a 
density profile may be useful in analyzing and comparing the
properties of several clusters systematically, but great care must
be taken when using this model-dependent definition to estimate
the "true" cluster radius. In fact, the point where the fitted
star density equals the background (or drops to zero) does not
necessarily coincide with the outer boundary of an open cluster.
In principle, newborn stars in a young cluster are distributed
throughout the region that was able to collapse gravitationally to form them.
At a certain distance from the density peak in the molecular
cloud, the required conditions are no longer fulfilled and the
star formation efficiency may decrease abruptly. Thus, a radial star
density distribution that decays smoothly out to R_c may not
always be suitable, especially for compact and/or very young star
clusters. Moreover, if the clusters exhibit some degree of
substructure, this kind of procedure yields totally unrealistic results
(Sánchez & Alfaro 2009). Young embedded clusters often show
hierarchical structure (Elmegreen 2009), so these methods
cannot in principle be applied to embedded clusters, only to
centrally concentrated open clusters.
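
As a rough illustration of the kind of model-dependent fit discussed above, the following Python sketch fits a radial density profile of the assumed form ρ(r) = ρ_bg + k r^(-α), i.e. a power law on top of a constant background (the exact functional form behind Figure 9 is not specified in this section). Because the model is linear in ρ_bg and k for fixed α, the sketch scans a grid of α values and solves each case by ordinary least squares. The function names and the `contrast` threshold used to turn the fit into a radius are hypothetical: a power law never actually reaches the background, so some such threshold has to be chosen, which is one more reason why the resulting radius is model-dependent.

```python
import numpy as np


def fit_power_law_background(r, density, alphas=np.linspace(0.1, 4.0, 391)):
    """Fit rho(r) = rho_bg + k * r**(-alpha) by a grid search over alpha.

    For each trial alpha the model is linear in (rho_bg, k), so both are
    obtained by ordinary least squares; the alpha giving the smallest
    residual sum of squares is kept. (Hypothetical helper, not the
    paper's fitting code.)
    """
    r = np.asarray(r, dtype=float)
    density = np.asarray(density, dtype=float)
    best = None
    for alpha in alphas:
        design = np.column_stack([np.ones_like(r), r ** (-alpha)])
        coef, *_ = np.linalg.lstsq(design, density, rcond=None)
        rss = ((design @ coef - density) ** 2).sum()
        if best is None or rss < best[0]:
            best = (rss, alpha, coef[0], coef[1])
    _, alpha, rho_bg, k = best
    return alpha, rho_bg, k


def radius_at_contrast(alpha, rho_bg, k, contrast=0.1):
    """Radius where the fitted excess k*r**(-alpha) equals contrast * rho_bg.

    The contrast level is an arbitrary (assumed) choice.
    """
    return (k / (contrast * rho_bg)) ** (1.0 / alpha)
```

Fitting the profile measured within R_s,opt and the one measured within a larger R_s with the same code is a simple way to see how strongly the fitted index and the derived radius depend on the sampling radius.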

Obviously, any reliable estimation of the cluster radius
ultimately depends on the membership determination. Field star
contamination may affect the determination of R_c, and what we
are showing in this work is that this contamination can become a
severe problem if it is not taken into consideration. Furthermore,
even if cluster and field populations were well separated,
the estimated radius would depend on the limiting magnitude if,
for instance, there were mass segregation. This kind of problem
is particularly relevant for the development of automated
techniques, in which objective criteria are needed
to decide the size of the sample to be processed. What we are
proposing here is to apply any suggested method to several
sample sizes R_s and analyse the resulting behavior. It is difficult to
give simple rules for evaluating this behavior because the results
depend directly on both the membership determination
algorithm and the input data. However, for the method considered
in this work, based on two underlying Gaussian populations, the
basic procedure can be outlined as follows.

1. An upper limit for R_s can be estimated beforehand by fitting
the spatial star density to, for example, a King profile. The
estimated tidal radius (or, to be on the safe side, twice its
value) may be considered an upper limit of the optimal sampling
radius and would define the range of R_s values to be
scanned.

2. For each R_s value, cluster memberships and all the relevant
quantities (numbers of cluster stars and field stars, centroids
with their standard deviations, effectiveness of membership
determination) have to be estimated.

3. The next step is to plot the number of cluster members N_c as
a function of the sampling radius R_s. If the membership
determination works reasonably well, meaning that it shows
little contamination by field stars, then we would observe
behavior like that seen in Figure 2: N_c increases as R_s
increases up to some point (precisely when R_s = R_c) and then
remains approximately constant for higher R_s values (or
increases at a much slower rate). In this way, we have a
method to estimate the cluster size directly from the data and
the membership criteria without making any additional
assumptions. The optimal sampling radius at which we get the
best membership estimation is precisely R_s,opt = R_c (Fig. 4).

4. If the parametric model does not adequately describe the real
data and/or if the internal noise does not have a simple structure,
then the behavior of the estimated parameters with R_s will
differ from the expected one. In that case we
should plot the fraction of members N_c/(N_c + N_f) versus R_s,
where we should be able to identify the optimal sampling
radius R_s,opt as the minimum of this plot (Fig. 6). In the absence
of more accurate information, this value would correspond
to the radius for which the membership classification is the
most reliable (for such a method applied to a given astrometric
catalogue).

5. Our experience indicates that the properties derived from
Sanders' method tend to exhibit some noise, and it is not
always easy to identify the exact position of specific
features (such as the minimum in the N_c/(N_c + N_f) versus R_s plot).
Some complementary strategies may be useful in identifying
or confirming the optimal sampling radius. First, one can
examine the variation of the proper motion standard
deviation with radius. The dispersion of the cluster proper
motions should display a change of slope at a radius close to its
optimal value (Fig. 7). Second, the maximum of the
effectiveness of membership determination should also lie around
R_s,opt (Fig. ?).
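
The procedure above can be sketched in Python as follows. This is a simplified stand-in, not the implementation used in this work: the mixture fit assumes a circular Gaussian for the cluster and an axis-aligned Gaussian for the field in proper-motion space, fitted by expectation-maximization without observational errors; the cluster centre is assumed known (positions are given relative to it); N_c is taken as the sum of membership probabilities; and the effectiveness is assumed to be E = 1 - N Σ P(1 - P) / [Σ P Σ (1 - P)]. All function names, initial guesses and the 5% plateau tolerance are hypothetical.

```python
import numpy as np


def sanders_em(mu, mu_c0, sigma_c0=0.3, n_iter=300):
    """Simplified stand-in for Sanders' fit: a two-Gaussian mixture fitted by EM.

    Cluster: circular Gaussian; field: axis-aligned Gaussian (assumptions).
    mu: (N, 2) proper motions; mu_c0: rough guess of the cluster centroid.
    Returns the cluster membership probability of each star.
    """
    mu = np.asarray(mu, dtype=float)
    w, mc, sc = 0.5, np.asarray(mu_c0, dtype=float), float(sigma_c0)
    mf, sf = mu.mean(axis=0), mu.std(axis=0)
    for _ in range(n_iter):
        # E-step: posterior probability that each star is a cluster member
        d2c = ((mu - mc) ** 2).sum(axis=1) / sc**2
        phi_c = np.exp(-0.5 * d2c) / (2 * np.pi * sc**2)
        d2f = (((mu - mf) / sf) ** 2).sum(axis=1)
        phi_f = np.exp(-0.5 * d2f) / (2 * np.pi * sf.prod())
        num = w * phi_c
        p = num / (num + (1.0 - w) * phi_f + 1e-300)
        # M-step: update mixing fraction, centroids and dispersions
        q = 1.0 - p
        w = p.mean()
        mc = (p[:, None] * mu).sum(axis=0) / p.sum()
        sc = np.sqrt((p * ((mu - mc) ** 2).sum(axis=1)).sum() / (2.0 * p.sum()))
        mf = (q[:, None] * mu).sum(axis=0) / q.sum()
        sf = np.sqrt((q[:, None] * (mu - mf) ** 2).sum(axis=0) / q.sum())
    return p


def effectiveness(p):
    """Assumed definition: E = 1 - N*sum(p(1-p)) / (sum(p) * sum(1-p))."""
    p = np.asarray(p, dtype=float)
    return 1.0 - p.size * (p * (1 - p)).sum() / (p.sum() * (1 - p).sum())


def scan_sampling_radius(x, y, mu, radii, mu_c0):
    """Step 2: refit memberships using only the stars inside each R_s.

    Returns N_c, the member fraction N_c/(N_c + N_f) and E for each radius.
    """
    r = np.hypot(x, y)
    mu = np.asarray(mu, dtype=float)
    n_c, frac, eff = [], [], []
    for rs in radii:
        sel = r <= rs
        p = sanders_em(mu[sel], mu_c0)
        n_c.append(p.sum())
        frac.append(p.sum() / sel.sum())
        eff.append(effectiveness(p))
    return np.array(n_c), np.array(frac), np.array(eff)


def plateau_radius(radii, n_c, tol=0.05):
    """Step 3 heuristic: first R_s after which N_c grows by less than tol."""
    for i in range(len(radii) - 1):
        if (n_c[i + 1] - n_c[i]) / n_c[i] < tol:
            return radii[i]
    return radii[-1]


def min_fraction_radius(radii, frac):
    """Step 4: R_s at which the member fraction N_c/(N_c + N_f) is smallest."""
    return radii[int(np.argmin(frac))]


if __name__ == "__main__":
    # Hypothetical simulation: 200 members inside R_c = 5, 2000 field stars out to 20
    rng = np.random.default_rng(1)
    rad = np.concatenate([5 * np.sqrt(rng.random(200)), 20 * np.sqrt(rng.random(2000))])
    th = 2 * np.pi * rng.random(rad.size)
    x, y = rad * np.cos(th), rad * np.sin(th)
    mu = np.vstack([rng.normal(1.0, 0.1, (200, 2)), rng.normal(0.0, 2.0, (2000, 2))])
    radii = [3, 4, 5, 6, 8, 10, 12, 15, 20]
    n_c, frac, eff = scan_sampling_radius(x, y, mu, radii, mu_c0=(1.0, 1.0))
    print("plateau R_s:", plateau_radius(radii, n_c))
    print("min-fraction R_s:", min_fraction_radius(radii, frac))
```

In this simulated example both populations really are Gaussian, so N_c levels off at the simulated cluster radius (step 3) while the member fraction simply keeps decreasing with R_s; a minimum of N_c/(N_c + N_f) (step 4) is expected only when, as with real data, the model does not describe the proper-motion distributions well.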

The strategy proposed in this work, i.e. to estimate and
analyse cluster memberships as a function of R_s, should in
principle allow for the identification of the optimal sampling radius.
However, we would like to emphasize that it may not always
be possible (or at least not always unambiguous) to determine
R_s,opt in the way described above. For instance, for very peaked
cluster density profiles the change in N_c at R_s = R_c may not be
pronounced enough to be easily detected (e.g., Fig. 2). In
spite of this, it still seems appropriate and useful to perform this
kind of test before any further analysis.

5. Conclusions 

We have evaluated the performance of the commonly used
Sanders' method (Vasilevskis et al. 1958; Sanders 1971;
Cabrera-Cano & Alfaro 1985) in the determination of star cluster
memberships. In general, the results depend on the radius
of the field containing the sampled cluster (the sampling radius,
R_s). The main reason for this dependence lies in the differences
between the assumed Gaussian and the true underlying proper
motion distributions. The contamination of cluster members by
field stars increases as the sampling radius increases. The rate
at which this effect occurs depends on the intrinsic
characteristics of the data set. There is a threshold value of R_s above
which the identified cluster members are highly contaminated
by field stars and the effectiveness of membership determination
is relatively small. Thus, care must be taken when applying
Sanders' method (by itself or as part of a more extensive
procedure), especially when we do not have reliable information
about the real cluster radius and/or when the sampling radius is
larger than the cluster radius. If these effects are not taken into
consideration in automated data analysis, significant biases
may arise in the derived cluster parameters. The optimal sampling
radius can be estimated by plotting the number of cluster
members and/or the fraction of members as a function of the
sampling radius. Moreover, this type of analysis can also be used
as an objective procedure that can be applied systematically to
determine cluster radii.